(19) Japan Patent Office (JP)   (12) Unexamined Patent Publication (A)
(11) Publication number: P2001-215993A
(43) Publication date: August 10, 2001 (2001.8.10)
(51) Int. Cl.7: G10L 15/22; G06F 3/16 320; G10L 13/00; 15/10
FI: G06F 3/16 320 H; G10L 3/00 571 U; R; 531 N
Theme codes: 5D015; 5D045; 9A001
Number of claims: 16
(21) Application number: 2000-22225 (P2000-22225)
(22) Filing date: January 31, 2000 (2000.1.31)
(71) Applicant: 000002185
(74) Agent: 100082131

(54) [Title of the Invention] Interactive processing device and
method, and recording medium

(57) [Summary]

[Object] To make an interaction in a wide variety of forms depending
on feelings of a user.

[Solving Means] A voice recognizing portion 2 recognizes voice of
a user and extracts rhythm information of the voice. An interaction
controlling portion 3 extracts concept information of words and
phrases contained in the voice recognition results from the voice
recognizing portion 2. An image inputting portion 6 captures an
image of a face of the user and outputs face image information.
A physiological information inputting portion 7 detects
physiological information such as a pulse count of the user. Then,
a user's feeling information updating portion 8 estimates a feeling
of the user based on the rhythm information, concept information,
face image information, and physiological information. The
interaction controlling portion 3 and a sentence generating portion
4 generate an output sentence to be outputted to the user based
on the estimation results for the feeling.
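As a rough illustration of the estimation step in the summary, the sketch below combines per-modality scores into a single feeling estimate. The feeling labels, the weights, and the function `fuse_feelings` are assumptions introduced for illustration only; the summary states that the rhythm, concept, face image, and physiological information are used together, not how they are combined.

```python
from typing import Dict

# Hypothetical feeling labels; the source does not enumerate them here.
FEELINGS = ("joy", "anger", "sadness", "neutral")

# Hypothetical per-modality weights (assumed, not taken from the source).
WEIGHTS = {"rhythm": 0.3, "concept": 0.3, "face": 0.2, "physiological": 0.2}


def fuse_feelings(scores: Dict[str, Dict[str, float]]) -> str:
    """Return the feeling with the highest weighted sum across modalities.

    `scores` maps a modality name to {feeling: score}. Modalities missing
    from `scores` contribute nothing, so the estimate degrades gracefully
    when, for example, no camera image is available.
    """
    totals = {feeling: 0.0 for feeling in FEELINGS}
    for modality, per_feeling in scores.items():
        weight = WEIGHTS.get(modality, 0.0)
        for feeling, score in per_feeling.items():
            if feeling in totals:
                totals[feeling] += weight * score
    # max() keeps the first maximal item, so ties resolve in FEELINGS order.
    return max(FEELINGS, key=lambda f: totals[f])
```

A weighted sum is only one possible design; a rule table or a trained classifier would fit the same interface.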



[Scope of Claims] 

[Claim 1] An interactive processing device for making an interaction 
with a user, characterized by comprising: 

concept extracting means for extracting a concept of words 
and phrases inputted from a user; 

feeling estimating means for estimating a feeling of the user
based on the concept of the words and phrases inputted from the
user to output feeling information expressing the feeling; and

output sentence generating means for generating an output
sentence to be outputted to the user based on the feeling information.
[Claim 2] The interactive processing device according to claim 1,
characterized in that the feeling estimating means estimates the
feeling of the user based on the concept and the output sentence.
[Claim 3] The interactive processing device according to claim 1,
characterized in that the feeling estimating means estimates the
feeling of the user based on the concept and an image obtained by
capturing an image of the user.

[Claim 4] The interactive processing device according to claim 1, 
characterized in that the feeling estimating means estimates the 
feeling of the user based on the concept and a physiological phenomenon 
of the user. 

[Claim 5] The interactive processing device according to claim 1,
further comprising sound processing means for processing a sound
signal obtained from the outside,

characterized in that the feeling estimating means estimates
the feeling of the user based on the concept and processing results
of the sound processing means.

[Claim 6] The interactive processing device according to claim 1, 
further comprising voice recognizing means for recognizing voice 
of the user, 

characterized in that the concept extracting means extracts 
a concept of words and phrases contained in voice recognition results 
for the voice of the user. 

[Claim 7] The interactive processing device according to claim 6,
characterized in that the feeling estimating means estimates the 
feeling of the user based on the concept and rhythm information 
of the voice of the user. 

[Claim 8] The interactive processing device according to claim 1, 
characterized in that the output sentence generating means changes 
expression of the output sentence based on the feeling information. 
[Claim 9] The interactive processing device according to claim 1, 
characterized in that the output sentence generating means changes 
the number of the output sentences based on the feeling information. 
[Claim 10] The interactive processing device according to claim
9, characterized in that the output sentence is back-channel
feedback.

[Claim 11] The interactive processing device according to claim
1, further comprising storage means for storing the feeling
information,

characterized in that the output sentence generating means
generates the output sentence based on the feeling information stored
in the storage means.

[Claim 12] The interactive processing device according to claim
1, characterized by further comprising output sentence outputting 
means for outputting the output sentence. 

[Claim 13] The interactive processing device according to claim
12, characterized in that the output sentence outputting means
outputs the output sentence as synthetic tone.

[Claim 14] The interactive processing device according to claim
13, characterized in that the output sentence outputting means
controls a rhythm of the synthetic tone based on the feeling
information.

[Claim 15] An interactive processing method for making an interaction
with a user, characterized by comprising: 

a concept extracting step of extracting a concept of words 
and phrases inputted from a user; 

a feeling estimating step of estimating a feeling of the user 
based on the concept of the words and phrases inputted from the 
user to output feeling information expressing the feeling; and 

an output sentence generating step of generating an output 
sentence to be outputted to the user based on the feeling information.
[Claim 16] A recording medium characterized by storing a program
for causing a computer to execute an interactive processing for
making an interaction with a user, the program comprising:

a concept extracting step of extracting a concept of words 
and phrases inputted from a user; 

a feeling estimating step of estimating a feeling of the user
based on the concept of the words and phrases inputted from the 
user to output feeling information expressing the feeling; and 

an output sentence generating step of generating an output 
sentence to be outputted to the user based on the feeling information.
[Detailed Description of the Invention] 
[0001] 

[Technical Field to which the Invention belongs] The present 
invention relates to an interactive processing device and method, 
and a recording medium, and more particularly to an interactive 
processing device and method, and a recording medium for allowing 
an interaction that reflects, for example, a feeling of a user.
[0002] 

[Prior Art] In a so-called interactive system, when an input is 
made from a user, a response sentence corresponding to semantic 
contents of the input is generated to be outputted. 
[0003] 

[Problems to be solved by the Invention] Thus, in the conventional 
interactive system, irrespective of a feeling of a user, the same 
response sentence is outputted as long as the input has the same 
semantic contents. As a result, the same interaction is made.
[0004] The present invention has been made in light of such
circumstances, and thus allows an interaction in a wide variety
of forms depending on feelings of a user.
[0005]

[Means for solving the problem] 

An interactive processing device according to the present 
invention is characterized by including: concept extracting means 
for extracting a concept of words and phrases inputted from a user; 
feeling estimating means for estimating a feeling of the user based 
on the concept of the words and phrases inputted from the user to 
output feeling information expressing the feeling; and output 
sentence generating means for generating an output sentence to be 
outputted to the user based on the feeling information. 
[0006] The feeling estimating means may estimate the feeling of 
the user based on the concept and the output sentence. 
[0007] Further, the feeling estimating means may estimate the feeling 
of the user based on the concept and an image obtained by capturing 
an image of the user. 

[0008] Further, the feeling estimating means may estimate the feeling 
of the user based on the concept and a physiological phenomenon 
of the user. 

[0009] The interactive processing device according to the present 
invention may further include sound processing means for processing 
a sound signal obtained from the outside, in which the feeling
estimating means may estimate the feeling of the user based on the 
concept and processing results of the sound processing means. 
[0010] The interactive processing device according to the present 
invention may further include voice recognizing means for 
recognizing voice of the user, in which the concept extracting means 
may extract a concept of words and phrases contained in voice 
recognition results for the voice of the user. 

[0011] The feeling estimating means may estimate the feeling of 
the user based on the concept and rhythm information of the voice 
of the user. 

[0012] The output sentence generating means may change the expression
of the output sentence based on the feeling information. 
[0013] The output sentence generating means may change the number 
of the output sentences based on the feeling information. 
[0014] The output sentence may be back-channel feedback.
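To make paragraphs [0013] and [0014] concrete, the following minimal sketch varies the number of back-channel sentences with the estimated feeling. The phrases, the feeling labels, and the counts are arbitrary placeholders assumed for illustration; the source does not specify them.

```python
from typing import List

# Hypothetical back-channel phrases (assumed for illustration).
BACK_CHANNELS = ["I see.", "Uh-huh.", "Right."]

# Arbitrary placeholder counts per feeling; not taken from the source.
COUNT_BY_FEELING = {"sadness": 3, "joy": 2}


def back_channel_responses(feeling: str) -> List[str]:
    """Return a feeling-dependent number of back-channel sentences.

    Feelings without an entry in COUNT_BY_FEELING get a single sentence.
    """
    count = COUNT_BY_FEELING.get(feeling, 1)
    return BACK_CHANNELS[:count]
```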
[0015] The interactive processing device according to the present 
invention may further include storage means for storing the feeling 
information, in which the output sentence generating means generates
the output sentence based on the feeling information stored in the 
storage means. 

[0016] The interactive processing device according to the present 
invention may further include output sentence outputting means for 
outputting the output sentence. 
[0017] The output sentence outputting means may output the output
sentence as synthetic tone. 

[0018] Further, the output sentence outputting means may control 
a rhythm of the synthetic tone based on the feeling information. 
[0019] An interactive processing method according to the present 
invention is characterized by including: a concept extracting step 
of extracting a concept of words and phrases inputted from a user; 
a feeling estimating step of estimating a feeling of the user based 
on the concept of the words and phrases inputted from the user to 
output feeling information expressing the feeling; and an output 
sentence generating step of generating an output sentence to be 
outputted to the user based on the feeling information. 
[0020] A recording medium according to the present invention is 
characterized by storing a program for causing a computer to execute 
an interactive processing for making an interaction with a user, 
the program comprising: a concept extracting step of extracting 
a concept of words and phrases inputted from a user; a feeling 
estimating step of estimating a feeling of the user based on the 
concept of the words and phrases inputted from the user to output 
feeling information expressing the feeling; and an output sentence 
generating step of generating an output sentence to be outputted 
to the user based on the feeling information. 

[0021] In the interactive processing device and method, and the 
recording medium of the present invention, the concept of the words 
and phrases inputted from the user is extracted, and the feeling 
of the user is estimated based on the extracted concept. Then, the 
output sentence to be outputted to the user is generated based on 
the feeling information obtained as a result of the estimation. 
[0022] 

[Embodiment Mode of the Invention] Fig. 1 shows an example of a 
configuration of an embodiment of an interactive system (the system 
means logical assembly of a plurality of apparatuses and it does 
not matter for the system whether or not apparatuses having respective 
configurations are accommodated in the same chassis) to which the 
present invention is applied. 

[0023] A voice inputting portion 1, for example, is constituted 
by a microphone, an amplifier, and the like. The voice inputting 
portion 1 converts voice of a user into a sound signal as an electrical 
signal, amplifies the sound signal if necessary, and supplies the 
resultant sound signal to a voice recognizing portion 2. 
[0024] The voice recognizing portion 2 acoustically processes the 
sound signal from the voice inputting portion 1, and recognizes 
the voice of the user based on the acoustic processing results. 
The voice recognition results are supplied to an interaction 
controlling portion 3. In addition, the voice recognizing portion 
2 supplies rhythm information of the voice of the user, which is 
obtained by acoustically processing the sound signal, to a user's 
feeling information updating portion 8. 
[0025] The interaction controlling portion 3 generates contents 
of an output sentence to be outputted to the user as a response 
or the like to the voice recognition results from the voice recognizing 
portion 2 in consideration of the feeling information which expresses 
a feeling of the user and which is held (stored) in a user feeling 
information recording portion 9, and supplies content information 
expressing the contents to a sentence generating portion 4. In 
addition, the interaction controlling portion 3 extracts a concept 
of words and phrases contained in the voice recognition results 
from the voice recognizing portion 2, and words and phrases contained 
in the output sentence corresponding to content information 
generated by the interaction controlling portion 3 itself, and 
supplies concept information expressing the extracted concept to 
the user's feeling information updating portion 8. 
[0026] The sentence generating portion 4, while taking the feeling 
information held by the user feeling information recording portion 
9 into consideration, generates an output sentence in a text form 
corresponding to the content information from the interaction 
controlling portion 3, for example, and moreover generates a sound 
signal of a synthetic tone corresponding to the output sentence 
to supply the sound signal to a voice outputting portion 5. 
[0027] The voice outputting portion 5, for example, is constituted 
by an amplifier, a speaker, and the like. The voice outputting 
portion 5 amplifies the sound signal from the sentence generating 
portion 4 if necessary, and outputs the resultant sound signal through 
the speaker. 

[0028] An image inputting portion 6, for example, is constituted 
by a lens, a CCD (Charge Coupled Device), an A/D converter, and 
the like. The image inputting portion 6 captures an image of a face 
or the like of the user and supplies face image information as digital 
data (image data) of a face image which is obtained as a result 
of the image capture to the user's feeling information updating 
portion 8. 

[0029] A physiological information inputting portion 7, for example, 
is constituted by a pulse rate meter, sensors for measuring an 
amount of perspiration and a body temperature, and the like. The 
physiological information inputting portion 7 senses physiological 
phenomena such as a pulse rate, an amount of perspiration, and a 
body temperature of the user, and supplies the resultant 
physiological information to the user's feeling information updating 
portion 8. 

[0030] The user's feeling information updating portion 8 estimates 
a feeling of the user based on the rhythm information of the voice 
of the user from the voice recognizing portion 2, the concept 
information of the words and phrases contained in the voice 
recognition results or the like from the interaction controlling 
portion 3, the face image information from the image inputting portion 
6, and the physiological information from the physiological 
information inputting portion 7. Moreover, the user's feeling 
information updating portion 8 updates feeling information held 
in the user feeling information recording portion 9 with feeling 
information which is obtained as a result of the estimation. 

[0031] The user feeling information recording portion 9 holds feeling 
information in which feelings of joy, anger, surprise, and sorrow, 
for example, are expressed as feelings of the user in the form of 
numeric values falling within a predetermined range. 
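
As a minimal sketch only, the feeling information of paragraph [0031] could be held as follows. The class name `FeelingInformation`, the range 0.0 to 1.0, and the overwrite-style update are assumptions made for illustration; the text specifies only that the values fall within a predetermined range.

```python
from dataclasses import dataclass, field

# Assumed range; the text says only "a predetermined range".
MIN_LEVEL, MAX_LEVEL = 0.0, 1.0


@dataclass
class FeelingInformation:
    """Hypothetical record held by the user feeling information recording portion 9."""
    levels: dict = field(default_factory=lambda: {
        "joy": 0.0, "anger": 0.0, "surprise": 0.0, "sorrow": 0.0})

    def update(self, estimate: dict) -> None:
        """Overwrite stored levels with estimated ones, clamped to the range."""
        for name, value in estimate.items():
            self.levels[name] = max(MIN_LEVEL, min(MAX_LEVEL, value))


info = FeelingInformation()
info.update({"joy": 0.7, "anger": 1.4})  # 1.4 is clamped to MAX_LEVEL
```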

[0032] Next, a flow of basic processing in the interactive system 
of Fig. 1 will be described by referring to a flow chart of Fig. 2. 

[0033] When an utterance is made by a user, the voice inputting 
portion 1 subjects a voice of the utterance to a voice inputting 
processing in Step S1, and outputs the resultant sound signal to 
the voice recognizing portion 2. That is, the voice inputting portion 
1 converts the voice of the user into a sound signal as an electrical 
signal, amplifies the sound signal if necessary, and supplies the 
resultant sound signal to the voice recognizing portion 2. 
[0034] In Step S2, the voice recognizing portion 2 recognizes the 
voice of the user based on the sound signal from the voice inputting 
portion 1, and supplies the voice recognition results to the 
interaction controlling portion 3. Moreover, the voice recognizing 
portion 2 extracts rhythm information of the voice of the user from 
the sound signal from the voice inputting portion 1, and supplies 
the extracted rhythm information to the user's feeling information 
updating portion 8. 

[0035] Thereafter, the process proceeds to Step S3 in which a 
preparation process for updating the feeling information held in 
the user feeling information recording portion 9 is executed. 
[0036] That is, in Step S3, the interaction controlling portion 
3 executes a feeling information updating interaction controlling 
processing for obtaining the above-mentioned concept information 
used to update the feeling information based on the voice recognition 
results or the like for the voice of the user from the voice recognizing 
portion 2, and supplies the resultant concept information to the 
user's feeling information updating portion 8. Moreover, in Step 
S3, the image inputting portion 6 executes an image inputting 
processing for capturing an image of a face of the user to obtain 
face image information, and supplies the resultant face image 
information to the user's feeling information updating portion 8. 
In addition, in Step S3, the physiological information inputting 
portion 7 executes a physiological information inputting processing 
for obtaining physiological information of the user, and supplies 
the resultant physiological information to the user's feeling 
information updating portion 8. 

[0037] In Step S4, the user's feeling information updating portion 
8 estimates a feeling of the user based on the rhythm information 
of the voice of the user from the voice recognizing portion 2, the 
concept information from the interaction controlling portion 3, 
the face image information from the image inputting portion 6, and 
the physiological information from the physiological information 
inputting portion 7. Moreover, in Step S4, the user's feeling 
information updating portion 8 updates the feeling information held 
in the user feeling information recording portion 9 with the feeling 
information obtained as a result of the estimation. 
[0038] Thereafter, in Step S5, the interaction controlling portion 
3 executes a sentence generating interaction controlling processing 
for generating content information expressing contents of an output 
sentence to be outputted to the user as a response or the like to 
the voice recognition results from the voice recognizing portion 
2 in consideration of the feeling information which expresses the 
feeling of the user and which is held (stored) in the user feeling 
information recording portion 9, and supplies the content 
information to the sentence generating portion 4. 
[0039] Then, in Step S6, the sentence generating portion 4, while 
taking the feeling information held in the user feeling information 
recording portion 9 into consideration, generates an output sentence 
in a text form corresponding to the content information from the 
interaction controlling portion 3 (executes a sentence generating 
processing), and moreover generates a sound signal of the synthetic 
tone corresponding to the output sentence to supply the resultant 
sound signal to the voice outputting portion 5. 
[0040] In Step S7, the voice outputting portion 5 executes a voice 
outputting processing for amplifying the sound signal from the 
sentence generating portion 4 to output the resultant sound signal 
through the speaker. Then, the operation is completed. 
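
The flow of Steps S1 to S7 can be sketched as below, purely for illustration. Each entry of `portions` is a hypothetical stand-in for one portion of Fig. 1, and the stub behaviour in the usage part is invented; only the order of the steps is taken from the description.

```python
def run_interaction_cycle(utterance, portions, feeling_store):
    """One pass through Steps S1 to S7; every callable in `portions` is a stand-in."""
    sound = portions["voice_input"](utterance)                        # S1
    text, rhythm = portions["voice_recognition"](sound)               # S2
    concept = portions["extract_concept"](text)                       # S3
    face = portions["image_input"]()                                  # S3
    physio = portions["physiological_input"]()                        # S3
    feeling_store.update(
        portions["estimate_feeling"](rhythm, concept, face, physio))  # S4
    content = portions["interaction_control"](text, feeling_store)    # S5
    sentence = portions["generate_sentence"](content, feeling_store)  # S6
    return portions["voice_output"](sentence)                         # S7


# Invented stubs so that the sketch can be executed end to end.
stubs = {
    "voice_input": lambda u: u,
    "voice_recognition": lambda s: (s.lower(), {"speed": 1.0}),
    "extract_concept": lambda t: ["joy"] if "glad" in t else [],
    "image_input": lambda: None,
    "physiological_input": lambda: None,
    "estimate_feeling": lambda r, c, f, p: {"joy": 1.0 if "joy" in c else 0.0},
    "interaction_control": lambda t, fs: "greeting",
    "generate_sentence": lambda c, fs: "Hello!" if fs.get("joy", 0) > 0.5 else "Hello.",
    "voice_output": lambda s: s,
}
store = {}
spoken = run_interaction_cycle("I am glad", stubs, store)
```

Note that the feeling information is updated in Step S4 before the content information and the output sentence are generated in Steps S5 and S6, so that the response reflects the newest estimate.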
[0041] Note that in the above-mentioned case, in the interactive 
system, an output of the synthetic tone (hereinafter also referred 
to as an utterance in the interactive system as appropriate) is 
triggered by any utterance made by a user. Thus, the synthetic tone 
becomes a response to the utterance made by the user. However, in 
the interactive system, the utterance of the interactive system 
may also be triggered by any action other than an utterance made 
by the user. 

[0042] That is, in the interactive system, for example, it is possible 
to make the utterance every predetermined period of time. In addition, 
for example, when the face image of the user is obtained in the 
image inputting portion 6 (including a case where the face image 
having a predetermined facial expression is obtained in addition 
to a case where the face image is simply obtained), or when 
predetermined physiological information is obtained in the 
physiological information inputting portion 7, it is possible to 
make the utterance. Moreover, for example, when a value of the 
feeling information held in the user feeling information recording 
portion 9 reaches a predetermined value or larger or smaller, it 
is also possible to make the utterance. In those cases, an 
interaction is made such that the interactive system gives the 
utterance to the user, and the user responds to the utterance. 
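
The triggers listed in paragraph [0042] can be sketched as a single predicate. The function name `should_speak`, the period of 60 seconds, and the thresholds 0.1 and 0.9 are assumptions for illustration; the text states only that a predetermined period and predetermined values are used.

```python
def should_speak(elapsed_s, face_obtained, physio_obtained, feeling_levels,
                 period_s=60.0, low=0.1, high=0.9):
    """Return True when any trigger other than a user utterance fires."""
    if elapsed_s >= period_s:              # every predetermined period of time
        return True
    if face_obtained or physio_obtained:   # face image or physiological information obtained
        return True
    # A stored feeling value has reached a predetermined value or gone past it.
    return any(v <= low or v >= high for v in feeling_levels.values())
```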
[0043] Next, Fig. 3 shows an example of a configuration of the voice 
recognizing portion 2 of Fig. 1. 

[0044] The sound signal from the voice inputting portion 1 is supplied 
to an A/D (Analog Digital) converting portion 11. The A/D 
converting portion 11 converts the sound signal from an analog 
signal to a digital signal, and supplies the resultant sound data 
to a feature extracting portion 12. The feature extracting portion 
12 subjects the sound data from the A/D converting portion 11 to 
the acoustic processing for each appropriate frame to extract a feature 
parameter such as a spectrum, a linear prediction coefficient, a 
cepstrum coefficient, a line spectrum pair, or MFCC (Mel Frequency 
Cepstrum Coefficient), and supplies the extracted feature parameter 
to a matching portion 13. 

[0045] In addition, the feature extracting portion 12 supplies rhythm 
information such as an utterance speed, a pitch frequency, or a 
power which is obtained by subjecting the sound data to the acoustic 
processing to the user's feeling information updating portion 8. 
Note that a mora number or the like per frame, for example, can 
be used as the utterance speed. 
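
Two of the rhythm quantities named in paragraph [0045] can be sketched as follows. This is an illustration under assumptions: power is taken as the mean squared amplitude of non-overlapping frames, the mora count is supplied by the caller, and pitch frequency extraction is omitted. The function names are hypothetical.

```python
def frame_power(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [sum(x * x for x in samples[i:i + frame_len]) / frame_len
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def utterance_speed(mora_count, n_frames):
    """Moras per frame, the speed measure suggested in the text."""
    return mora_count / n_frames if n_frames else 0.0
```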

[0046] The matching portion 13 recognizes the voice (input voice) 
of the user based on the feature parameter supplied from the feature 
extracting portion 12 by referring to an acoustic model database 
14, a dictionary database 15, and a grammar database 16 if necessary. 
[0047] That is, the acoustic model database 14 stores acoustic models 
expressing acoustic features such as individual phonemes and 
syllables in the language of the voice to be recognized. Here, 
for example, HMM (Hidden Markov Model) and the like can be used 
as the acoustic models. The dictionary database 15 stores a word 
dictionary in which information related to pronunciations of words 
to be recognized is described. The grammar database 16 stores grammar 
rules that define the linkage (chain) between words registered in the word 
dictionary of the dictionary database 15. Here, for example, rules 
based on a context-free grammar (CFG), HPSG (Head-driven Phrase 
Structure Grammar), or statistical word sequence probability 
(N-gram) can be used as the grammar rules. 
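
As one hedged illustration of a statistical word sequence probability, a bigram (N = 2) estimate can be computed from counted word pairs. The function names, the boundary markers `<s>` and `</s>`, and the tiny corpus are assumptions; the text does not specify the order N or any smoothing.

```python
from collections import Counter


def train_bigram(sentences):
    """Count histories and word pairs, with sentence boundary markers."""
    history_counts, pair_counts = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] + list(words) + ["</s>"]
        history_counts.update(padded[:-1])           # every word that has a successor
        pair_counts.update(zip(padded, padded[1:]))  # adjacent word pairs
    return history_counts, pair_counts


def bigram_probability(prev, word, history_counts, pair_counts):
    """Maximum likelihood estimate of P(word | prev), without smoothing."""
    if history_counts[prev] == 0:
        return 0.0
    return pair_counts[(prev, word)] / history_counts[prev]


corpus = [["record", "channel", "four"], ["record", "tonight"]]
histories, pairs = train_bigram(corpus)
```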

[0048] The matching portion 13 connects the acoustic models stored 
in the acoustic model database 14 to construct the acoustic models 
(word models) of words by referring to the word dictionary of the 
dictionary database 15. Moreover, the matching portion 13 connects 
several word models by referring to the grammar rules stored in 
the grammar database 16. Then, the matching portion 13 recognizes 
the voice of the user based on the feature parameters through the 
HMM method or the like for example, using the word models which 
are connected in such a manner. 

[0049] Then, the rhythm information as the voice recognition results 
given by the matching portion 13 is outputted in the form of a text 
or a word graph, for example, to the interaction controlling portion 
3. 

[0050] Fig. 4 shows an example of a configuration of the interaction 
controlling portion 3 of Fig. 1. 

[0051] The voice recognition results for the voice of the user 
outputted from the voice recognizing portion 2 are supplied to a 
language processing portion 21. The language processing portion 
21 processes the voice recognition results while referring to a 
thesaurus database 23, a language processing database 24, and a 
history database 25 if necessary, and supplies information on the 
meaning and concept expressed by the voice recognition results to 
the interaction processing portion 22. 

[0052] That is, the thesaurus in which the words are classified 
in the form of hierarchical structure in accordance with their 
concepts is stored in the thesaurus database 23. The language 
processing portion 21 recognizes the concepts of the words contained 
in the voice recognition results by referring to the thesaurus. 
[0053] Here, for example, "Word List by Semantic Principles" or 
the like which is published by The National Language Research 
Institute can be used as the thesaurus. 

[0054] A word dictionary in which notation, necessary word class 
information, and the like of words are described, and syntax/meaning 
rules in which restrictions on the word sequence are described based 
on information of the words described in the word dictionary are 
stored in the language processing database 24. The language 
processing portion 21 carries out the morphological analysis of 
the voice recognition results inputted thereto based on the word 
dictionary and the syntax/meaning rules. Moreover, the language 
processing portion 21 carries out the syntax analysis of the voice 
recognition results, and analysis of the semantic contents based 
on the morphological analysis results. Then, the language 
processing portion 21 outputs the results of the analysis of the 
concepts of the words constituting the voice recognition results 
thus obtained, and the semantic contents of the voice recognition 
results (hereinafter collectively referred to as the language 
processing results as appropriate) to the interaction processing 
portion 22. 

[0055] Here, the language processing portion 21 can carry out the 
syntax analysis and semantic content analysis using the regular 
grammar, the context-free grammar, the HPSG or the statistical word 
sequence probability. 

[0056] In addition, the language processing portion 21 executes 
the processing with reference to the history database 25 as well 
if necessary. That is, the history (interaction history) of the 
interaction between the user and the interactive system is stored 
in the history database 25 in the form of a set of voice recognition 
results of the voice uttered by the user and the interactive system's 
response to the utterance, or a set of an output of the interactive 
system and voice recognition results of the voice uttered by the user 
in response to the output. The language processing portion 21 allows 
for the omission of the subject or the like in the voice recognition 
results, and analyzes the anaphoric expression or the like by 
referring to the interaction history. Thus, the language processing 
portion 21, for example, recognizes what a pronoun contained in 
the voice recognition results for the voice of the user concretely 
means. 

[0057] Note that since the information stored in the thesaurus 
database 23 and the language processing database 24 is basically not 
updated, it can be regarded as so-called static information. On the 
other hand, since the interaction history stored in the history 
database 25 is updated by an interaction processing portion 22, which 
will be described later, whenever an utterance is made by the user 
or the interactive system carries out any output to the user, it 
can be regarded as so-called dynamic information. 

[0058] As described above, the language processing portion 21 
extracts the concepts of the words (vocabularies) constituting the 
voice recognition results by referring to the thesaurus database 
23. When the concept expresses a feeling, the language processing 
portion 21 supplies the information on the concept expressing the 
feeling as the concept information to the user's feeling information 
updating portion 8. That is, when the word belonging to the concept 
expressing a feeling such as "joy", "anger", "surprise", "sorrow", 
"pain", "shamefulness" or "pleasure" on the thesaurus is contained 
in the voice recognition results, the language processing portion 
21 supplies the concept information expressing that concept to the 
user's feeling information updating portion 8. 
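
The extraction of feeling concepts described in paragraph [0058] can be sketched with a miniature thesaurus. The entries of `THESAURUS`, the representation of the hierarchy as a path from root concept to leaf concept, and the function name are all hypothetical; the actual thesaurus is far larger.

```python
# Hypothetical miniature thesaurus: word -> path from root concept to leaf concept.
THESAURUS = {
    "happy": ("mind", "feeling", "joy"),
    "furious": ("mind", "feeling", "anger"),
    "tomorrow": ("time", "day"),
}

# Feeling concepts named in the description.
FEELING_CONCEPTS = {"joy", "anger", "surprise", "sorrow",
                    "pain", "shamefulness", "pleasure"}


def feeling_concepts(words):
    """Return the feeling concepts to which the recognized words belong."""
    found = []
    for word in words:
        path = THESAURUS.get(word, ())  # words missing from the thesaurus are skipped
        found.extend(c for c in path if c in FEELING_CONCEPTS)
    return found
```

Only the concepts that express a feeling are passed on as concept information; a word such as "tomorrow" has a concept on the thesaurus but contributes nothing here.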

[0059] Note that the language processing portion 21 extracts the 
concept information of the words contained in the output of the 
interactive system stored as the interaction history in addition 
to the concept information of the words contained in the voice 
recognition results if necessary, and supplies the extracted 
information to the user's feeling information updating portion 8. 
[0060] That is, the user's feeling information updating portion 
8, as described above, estimates the feeling of the user. Then, 
the concept information of the words contained in the output of 
the interactive system as well as the concept information of the 
words contained in the voice recognition results are useful for 
the estimation in some cases. More specifically, for example, when 
such an utterance as to make fun of the user is made by the interactive 
system, it is anticipated that the user gets angry at the utterance. 
For this reason, the language processing portion 21 extracts the 
concept information as well of the words contained in the output 
of the interactive system stored as the interaction history by 
referring to the thesaurus, and supplies the extracted concept 
information together with the concept information of the words 
contained in the voice recognition results to the user's feeling 
information updating portion 8. 

[0061] The interaction processing portion 22, while referring to 
the history database 25 and a scenario database 26, generates contents 
of an output sentence to be outputted to the user as a response 
or the like to the voice recognition results for the voice of the 
user based on the language processing results from the language 
processing portion 21, and the feeling information which expresses 
a feeling of the user and which is held in the user feeling information 
recording portion 9, and supplies the content information expressing 
the contents to the sentence generating portion 4. 
[0062] That is, the scenario database 26, for example, stores 
scenarios as rules of a pattern of an interaction with the user 
for every task (topic). The interaction processing portion 22 basically 
determines the scenario to be used in an interaction with the user 
from the scenarios stored in the scenario database 26 based on the 
language processing results from the language processing portion 
21, and generates the content information in accordance with the 
determined scenario. 

[0063] More specifically, for example, the following scenario is 
stored for an objective-oriented task such as programming a VCR. 
[0064] (action (Question (date, start_time, end_time, channel))) 
(date ???) #date 
(start_time ???) #start time 
(end_time ???) #end time 
(channel ???) #channel 
... (1) 

[0065] Here, when the language processing results from the language 
processing portion 21 express a request for picture recording, 
the interaction processing portion 22 generates such content 
information as to instruct the user to set a date when the picture 
recording is carried out, a time when the picture recording is started, 
a time when the picture recording is ended, and a channel for which 
the picture recording is carried out, in this order, in accordance 
with the scenario (1). 
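
The behaviour of the scenario (1) can be sketched as slot filling, in which the unset items are asked for in a fixed order. The list `SLOTS`, the function `next_question`, and the use of a dict for the values set so far are assumptions for illustration; the scenario notation itself is not interpreted here.

```python
# Order of the questions taken from the scenario (1).
SLOTS = ["date", "start_time", "end_time", "channel"]


def next_question(filled):
    """Return the first slot that the user has not set yet, or None when done."""
    for slot in SLOTS:
        if filled.get(slot) is None:
            return slot
    return None
```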

[0066] In addition, the following scenario is stored as the scenario 
for a free interaction (so-called idle talk) for example. 
[0067] If X exist then speak (Y) #X: keyword, Y: response sentence 
(Money What do you want) # (X Y) 
(I want to eat something Do you feel hungry) 
... (2) 

[0068] Here, in accordance with the scenario (2), when a keyword 
of "Money" is contained in the language processing results from 
the language processing portion 21, the interaction processing 
portion 22 generates the content information as a question of "What 
do you want". In addition, when the keyword of "I want to eat 
something" is contained in the language processing results from 
the language processing portion 21, the interaction processing 
portion 22 generates the content information as a question of "Do 
you feel hungry". 

[0069] In addition, the interaction processing portion 22, for 
example, determines the scenario to be used based on not only the 
language processing results from the language processing portion 
21, but also the feeling information held in the user feeling 
information recording portion 9. That is, for example, in a case 
where the language processing results from the language processing 
portion 21 express that the user said hello, when the feeling 
information expresses that "pleasure" or "happiness" is at a normal 
level, or when the feeling information expresses that "anger" or 
"irritation" is at a high level, the interaction processing portion 
22 determines the use of the scenario for simply giving a reply 
of "hello" to the user. In addition, for example, in a case where 
the language processing results from the language processing portion 
21 express that the user said hello, when the feeling information 
expresses that "pleasure" or "happiness" is at a high level, the 
interaction processing portion 22 determines the use of the scenario 
for putting a question of "did some happiness happen to you?" to 
the user. 
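
The keyword scenario (2) and the feeling-dependent choice of a greeting scenario in paragraph [0069] can be sketched together as below. The threshold `HIGH`, the case-insensitive substring match, and the rule that a high level of anger takes precedence when both anger and pleasure are high are assumptions; the text does not cover that combined case.

```python
# (keyword, response sentence) pairs taken from the scenario (2).
KEYWORD_RESPONSES = [
    ("Money", "What do you want"),
    ("I want to eat something", "Do you feel hungry"),
]

HIGH = 0.8  # assumed boundary of a "high level"


def idle_talk(text):
    """Return the response for the first keyword found in the text, else None."""
    for keyword, response in KEYWORD_RESPONSES:
        if keyword.lower() in text.lower():
            return response
    return None


def greeting_reply(feeling):
    """Choose the reply to a greeting from the stored feeling levels."""
    pleasure = max(feeling.get("pleasure", 0.0), feeling.get("happiness", 0.0))
    anger = max(feeling.get("anger", 0.0), feeling.get("irritation", 0.0))
    if pleasure >= HIGH and anger < HIGH:
        return "Did some happiness happen to you?"
    return "Hello"
```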

[0070] Note that in addition to the scenarios, the general knowledge 
required to interact with the user is also stored in the scenario 
database 26. That is, for example, when the language processing 
results from the language processing portion 21 express that the 
user said hello, the information for instructing the interactive 
system to give a reply to the user is stored as the general knowledge 
in the scenario database 26. In addition, for example, topics or 
the like used in idle talk are also stored as the general knowledge 
in the scenario database 26. 

[0071] Moreover, the interaction processing portion 22 stores the 
language processing results from the language processing portion 
21, the content information generated by the interaction processing 
portion 22 itself, the information related to the scenarios used 
to generate the content information, and the like as the interaction 
history in the history database 25. 

[0072] In addition, the interaction processing portion 22 refers 
to the interaction history if necessary. Thus, for example, the 
interaction processing portion 22 also copes with a case where 
erroneous analysis of the voice recognition results, or erroneous 
semantic analysis thereof is detected later or the like. 
[0073] Next, Fig. 5 shows an example of a configuration of the sentence 
generating portion 4 of Fig. 1. 

[0074] The content information from the interaction controlling 
portion 3 is supplied to a text sentence generating portion 31. 
The text sentence generating portion 31 generates an output sentence 
in the text form corresponding (adapted) to the content information 
while referring to a dictionary database 34 and a generating grammar 
database 35 if necessary. 

[0075] That is, a word dictionary in which word class information 
of words, and information such as pronunciation and accents of the 
words are described is stored in the dictionary database 34. 
Templates of examples for an output sentence, declension rules for 
words necessary for generating an output sentence, and generating 
grammar rules such as restriction information of word order are 
stored in the generating grammar database 35. Then, the text sentence 
generating portion 31 selects the template corresponding to the 
content information, and selects necessary words from the word 
dictionary. Moreover, the text sentence generating portion 31, 
while suitably changing the ending or the like of a word, fits the 
words to the template by referring to the generating grammar rules, 
thereby generating an output sentence corresponding to the content 
information. 

[0076] In addition, the feeling information held in the user feeling 
information recording portion 9 is also supplied to the text sentence 
generating portion 31. The text sentence generating portion 31 
changes the expression of the output sentence based on the feeling 
information supplied thereto. That is, the templates which are 
identical in contents to one another but are different in expression 
from one another are stored in the generating grammar database 35. 
The text sentence generating portion 31 selects a template having 
a predetermined expression from such templates having the same 
contents, based on the feeling information. In addition, the text 
sentence generating portion 31 selects the words each having a 
predetermined expression also for the words to be fitted to the 
template, based on the feeling information. Moreover, the text 
sentence generating portion 31 also changes the ending or the like 
of a word, based on the feeling information. 

[0077] Thus, for example, when the feeling information expresses 
that "anger" or "sorrow" is at a high level, the text sentence 
generating portion 31 generates an output sentence having a relatively 
polite expression. In addition, for example, when the feeling 
information expresses that "pleasure" or "joy" is at a high level, 
the text sentence generating portion 31 generates an output sentence 
having a so-called rough expression. 
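
The selection among templates that are identical in contents but different in expression can be sketched as follows. The template strings, the style names, the content key `ask_slot`, and the threshold 0.8 are invented for illustration; only the tendency (polite for high anger or sorrow, rough for high pleasure or joy) comes from the description.

```python
# Hypothetical templates with the same contents and different expressions.
TEMPLATES = {
    "ask_slot": {
        "polite": "Could you please tell me the {slot}?",
        "rough": "So, what's the {slot}?",
        "neutral": "What is the {slot}?",
    },
}


def choose_style(feeling, high=0.8):
    """Pick an expression style from the feeling levels (threshold assumed)."""
    if max(feeling.get("anger", 0.0), feeling.get("sorrow", 0.0)) >= high:
        return "polite"
    if max(feeling.get("pleasure", 0.0), feeling.get("joy", 0.0)) >= high:
        return "rough"
    return "neutral"


def generate(content, feeling, **words):
    """Fit the given words to the template selected for the content and feeling."""
    return TEMPLATES[content][choose_style(feeling)].format(**words)
```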

[0078] Note that in addition to the method using templates, for 
example, a method based on a case structure or the like may also 
be adopted as a method of generating an output sentence. 
[0079] After outputting an output sentence, the text sentence 
generating portion 31 carries out the morphological analysis, the 
syntax analysis, and the like to extract information necessary for 
voice rule synthesis carried out in a rule synthesizing portion 
32 in a subsequent stage. Here, as for the information necessary 
for the voice rule synthesis, for example, there are the information 
for controlling a timing of the pause, an accent, and intonation, 
other rhythm information, rhythm information containing 
pronunciations of words, and the like. 

[0080] The information obtained in the text sentence generating 
portion 31 is supplied to the rule synthesizing portion 32. The 
rule synthesizing portion 32 generates the sound data (digital data) 
of synthetic tone corresponding to the output sentence generated 
in the text sentence generating portion 31 using a phoneme database 
36. 

[0081] That is, for example, phoneme data is stored in the form 
of CV (Consonant, Vowel), VCV, CVC or the like in the phoneme database 
36. The rule synthesizing portion 32 combines the necessary phoneme 
data based on the information from the text sentence generating 
portion 31. The rule synthesizing portion 32 suitably applies the 
pause, the accent, the intonation, and the like to generate sound 
data of the synthetic sound corresponding to the output sentence 
generated in the text sentence generating portion 31. 
[0082] In addition, the feeling information held in the user feeling 
information recording portion 9 is supplied to the rule synthesizing 
portion 32. The rule synthesizing portion 32 controls the rhythm 
information such as the pause, the accent, the intonation, the 
utterance speed, and a pitch frequency applied to the combined phoneme 
data based on the feeling information. That is, for example, when 
the feeling information expresses that the user is excited, the 
rule synthesizing portion 32 generates the sound data of a synthetic 
sound having a slow and calm tone. In addition, for example, when 
the feeling information expresses that the user seems to be in a 
pleasant mood, the rule synthesizing portion 32 also generates the 
sound data of a synthetic sound having a pleasant tone. 

[0083] Note that the details of the relationship between the feeling 
and the voice are described in Maekawa: "Transmission of Paralanguage 
Information Through Voice; From a Viewpoint of Linguistics", 
Proceedings of the (1997) Autumn Meeting of the Acoustical Society 
of Japan, 1-3-10, pp. 381 to 384, September, 1997, or the like. 
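
The feeling-dependent control of the rhythm by the rule synthesizing portion 32, described in paragraph [0082], can be sketched as a mapping from the feeling information to multipliers. The keys `excitement` and `pleasure`, the threshold, and every numeric multiplier are assumptions for illustration and are not values given in the description.

```python
def prosody_settings(feeling, high=0.8):
    """Return assumed multipliers for the utterance speed and the pitch frequency."""
    settings = {"speed": 1.0, "pitch": 1.0}
    if feeling.get("excitement", 0.0) >= high:
        settings.update(speed=0.8, pitch=0.9)   # slow and calm for an excited user
    elif feeling.get("pleasure", 0.0) >= high:
        settings.update(speed=1.1, pitch=1.1)   # brighter for a pleased user
    return settings
```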
[0084] The sound data of the synthetic sound obtained in the rule 
synthesizing portion 32 is supplied to a D/A (Digital Analog) 
converting portion 33, and the D/A converting portion 33 converts 
the sound data supplied thereto into the sound signal as an analog 
signal. The resultant sound signal is supplied to the voice 
outputting portion 5 from which a synthetic sound corresponding 
to the output sentence generated in the text sentence generating 
portion 31 is in turn outputted. 

[0085] Next, Fig. 6 shows an example of a configuration of the user's 
feeling information updating portion 8 of Fig. 1. 
[0086] The rhythm information outputted by the voice recognizing 
portion 2, the concept information outputted by the interaction 
controlling portion 3, the face image information outputted by the 
image inputting portion 6, and the physiological information 
outputted by the physiological information inputting portion 7 are 
supplied to a rhythm information processing portion 41, a concept 
information processing portion 42, an image information processing 
portion 43, and a physiological information processing portion 44,
respectively.

[0087] The rhythm information processing portion 41 processes the
rhythm information supplied thereto to estimate a feeling of the 
user, and outputs the feeling information as the estimation results 
to an update processing portion 45. 

[0088] Note that for example, a method or the like described in 
JP 10-55194 A can be used as a method of estimating a feeling of 
a user based on the rhythm information of a voice of the user. 
[0089] The concept information processing portion 42 processes the 
concept information supplied thereto to estimate a feeling of the 
user, and outputs feeling information as the estimation results 
to the update processing portion 45. That is, the concept information
processing portion 42 measures an appearance frequency at which
words belonging to concepts expressing feelings such as "joy"
and "anger" appear in the interaction between the user and the
interactive system, based on the concept information. Then, the
concept information processing portion 42 estimates a feeling of 
the user based on the appearance frequency, and outputs feeling 
information as the estimation results to the update processing 
portion 45. 
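
A minimal sketch of the frequency-based estimation in paragraph [0089] is given below. The word-to-concept table `CONCEPT_OF` is a hypothetical stand-in for the thesaurus, and using the relative frequency as the feeling strength is an assumption; the specification only states that the estimate is based on the appearance frequency.

```python
from collections import Counter

# Hypothetical thesaurus fragment mapping words to feeling concepts.
CONCEPT_OF = {
    "happy": "joy", "glad": "joy", "great": "joy",
    "angry": "anger", "annoying": "anger",
    "sad": "sorrow",
}
FEELINGS = ("joy", "anger", "surprise", "sorrow")


def estimate_from_concepts(words):
    """Return per-feeling strengths (0 to 1) from relative appearance frequency."""
    counts = Counter(CONCEPT_OF[w] for w in words if w in CONCEPT_OF)
    total = sum(counts.values())
    if total == 0:
        # No feeling-related words appeared in the interaction.
        return {f: 0.0 for f in FEELINGS}
    return {f: counts[f] / total for f in FEELINGS}


history = ["i", "am", "so", "happy", "today", "great",
           "but", "traffic", "was", "annoying"]
estimate = estimate_from_concepts(history)
```

Here two of the three feeling-related words belong to "joy" and one to "anger", so the estimate leans toward joy.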

[0090] The image information processing portion 43 processes the 
face image information supplied thereto to estimate a feeling of 
the user, and outputs feeling information as the estimation results 
to the update processing portion 45.
[0091] That is, Fig. 7 shows an example of a configuration of the
image information processing portion 43 of Fig. 6. 
[0092] The face image information is supplied to a feature extracting
portion 51, and the feature extracting portion 51 extracts feature
quantities of the face image information. That is, the feature
extracting portion 51, for example, wavelet-transforms the face image
information to obtain a feature vector whose components are
coefficients expressing spatial frequency components, and
supplies information on the resultant feature vector to a vector
quantization portion 52.
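
As an illustration of the feature extraction in paragraph [0092], the sketch below applies one level of a Haar wavelet transform to a small grayscale image and flattens the coefficients into a feature vector. The choice of the Haar wavelet, the single decomposition level, and the even image dimensions are assumptions; the specification does not name a particular wavelet.

```python
def haar_step(values):
    """One level of the 1-D Haar transform: pairwise averages, then differences."""
    pairs = [(values[i], values[i + 1]) for i in range(0, len(values), 2)]
    return [(a + b) / 2 for a, b in pairs] + [(a - b) / 2 for a, b in pairs]


def haar_features(image):
    """Transform every row, then every column, and flatten into a feature vector."""
    rows = [haar_step(row) for row in image]
    cols = [haar_step(list(col)) for col in zip(*rows)]
    transformed = [list(r) for r in zip(*cols)]  # transpose back to row order
    return [c for row in transformed for c in row]


flat = haar_features([[4, 4], [4, 4]])      # uniform image
textured = haar_features([[1, 3], [5, 7]])  # image with gradients
```

For the uniform image only the first (low-frequency) coefficient is non-zero, whereas the second image also yields non-zero detail coefficients.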

[0093] The vector quantization portion 52 vector-quantizes the 
feature vector from the feature extracting portion 51 in accordance 
with a code book stored in a code book database 54 to obtain a
one-dimensional symbol (series).

[0094] That is, code books obtained by carrying out learning using
images of faces expressing feelings such as joy, anger, surprise,
and sorrow are stored in the code book database 54. Note that in
this example, in order to enhance the quantization precision,
code books for individual feelings, such as a joy code book and
an anger code book, are created and stored in the code book database
54.

[0095] Then, the vector quantization portion 52 vector-quantizes
the feature vector from the feature extracting portion 51 in
accordance with the code books for individual feelings stored in
the code book database 54 to obtain a symbol (a code assigned to
a code vector of the code book), and outputs the resultant symbol
to a matching portion 53. Consequently, the symbols as the vector
quantization results for the individual feelings are supplied to the
matching portion 53.
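
The per-feeling vector quantization of paragraphs [0093] to [0095] can be sketched as a nearest-neighbor search in each code book. The toy two-dimensional code books and the squared Euclidean distance are assumptions made for illustration; real code books would be learned from face images.

```python
def nearest_code(vector, code_book):
    """Return the symbol (index) of the code vector closest to `vector`."""
    def sq_dist(code_vector):
        return sum((x - c) ** 2 for x, c in zip(vector, code_vector))
    return min(range(len(code_book)), key=lambda i: sq_dist(code_book[i]))


def quantize(frames, code_books):
    """Quantize a feature-vector sequence once per feeling-specific code book."""
    return {feeling: [nearest_code(v, book) for v in frames]
            for feeling, book in code_books.items()}


# Toy code books (hypothetical values).
CODE_BOOKS = {
    "joy": [(0.0, 0.0), (1.0, 1.0)],
    "anger": [(0.0, 1.0), (1.0, 0.0), (0.5, 0.5)],
}
frames = [(0.1, 0.2), (0.9, 0.8)]
symbols = quantize(frames, CODE_BOOKS)
```

Each feeling therefore yields its own symbol series for the same input frames, which is what the matching portion receives.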

[0096] The matching portion 53 carries out matching, using the symbols
from the vector quantization portion 52 and referring to an HMM
database 55, in order to determine which of the faces expressing, for
example, a joyful feeling, an angry feeling, a surprised feeling, and
a sorrowful feeling the face image information corresponds to.

[0097] That is, models (HMM) about the faces in the individual feelings 
which are obtained by carrying out the learning using the images 
of the face in such feeling as joy, anger, surprise, and sorrow 
are stored in the HMM database 55. 

[0098] Then, the matching portion 53 obtains the model with the 
highest probability at which the symbol series obtained from the 
vector quantization portion 52 is observed by utilizing a Viterbi 
method. Moreover, the matching portion 53 estimates the feeling 
corresponding to that model as a feeling of the user, and outputs 
the feeling information as the estimation results to the update 
processing portion 45. 

[0099] Here, the calculation of the probability at which the symbol
series obtained from the vector quantization portion 52 is observed
is carried out for each feeling in the matching portion 53. That is,
for example, the probability of observing the symbol series obtained
by carrying out the vector quantization using the joy code book is
calculated using the HMM (joy HMM) obtained by carrying out the
learning using images of the face in the joyful feeling. In addition,
the probability of observing the symbol series obtained by carrying
out the vector quantization using the anger code book is calculated
using the HMM (anger HMM) obtained by carrying out the learning
using images of the face in the angry feeling.
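
The matching in paragraphs [0096] to [0099] can be sketched as scoring each feeling's symbol series with that feeling's HMM and choosing the best score. The sketch uses the Viterbi (best-path) probability for discrete HMMs; the two-state toy model parameters and the symbol series are hypothetical, and each series is assumed to be non-empty.

```python
def viterbi_score(symbols, start, trans, emit):
    """Probability of the most likely state path emitting `symbols`."""
    n_states = len(start)
    best = [start[s] * emit[s][symbols[0]] for s in range(n_states)]
    for sym in symbols[1:]:
        best = [max(best[p] * trans[p][s] for p in range(n_states)) * emit[s][sym]
                for s in range(n_states)]
    return max(best)


def classify(symbols_by_feeling, hmms):
    """Score each feeling's symbol series with its own HMM and pick the best."""
    scores = {f: viterbi_score(symbols_by_feeling[f], *hmms[f]) for f in hmms}
    return max(scores, key=scores.get), scores


# Toy left-to-right HMMs given as (start, transition, emission); values are illustrative.
TRANS = [[0.5, 0.5], [0.0, 1.0]]
HMMS = {
    "joy": ([1.0, 0.0], TRANS, [[0.9, 0.1], [0.1, 0.9]]),
    "anger": ([1.0, 0.0], TRANS, [[0.1, 0.9], [0.9, 0.1]]),
}
label, scores = classify({"joy": [0, 0, 1], "anger": [0, 0, 1]}, HMMS)
```

With these toy models the joy HMM explains its symbol series far better than the anger HMM, so "joy" is returned as the estimated feeling.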

[0100] Note that the details of the method of estimating a feeling 
from face image information in the manner as described above are 
described in, for example, Sakaguchi, Ohya, and Kishino: "Facial
Expression Recognition from Image Sequence using Hidden Markov
Model", The Journal of the Institute of Television Engineers of
Japan, Vol. 49, No. 8, pp. 1060 to 1067, August, 1995, or the like.
[0101] A method described in Sakaguchi, Morishima: "Real-time Basic 
Facial Expression Recognition based on Spatial Frequency
Information", Proceedings of the 2nd Symposium on Intelligence
Information Media, pp. 75-82, December, 1996, or the like can also 
be adopted as the method of estimating a feeling from face image 
information. 

[0102] Referring back to Fig. 6, the physiological information
processing portion 44 processes the supplied physiological
information to estimate a feeling of the user, and outputs the feeling
information as the estimation results to the update processing
portion 45. Here, as a method of estimating a feeling of the
user from the physiological information, there is known, for example,
a method in which a function expressing a correlation between feelings
and physiological information such as a pulse count or an amount
of perspiration is statistically obtained in advance, and a feeling
of a user is estimated using the resultant function.
[0103] The update processing portion 45 comprehensively uses the
feeling information from the rhythm information processing
portion 41, the concept information processing portion 42, the image
information processing portion 43, and the physiological information
processing portion 44 to obtain a final update value with which the
feeling information held in the user feeling information recording
portion 9 is to be updated, and updates the feeling information held
in the user feeling information recording portion 9 with the resultant
update value. That is, the update processing portion 45 executes,
for example, weighted addition and normalization of the feeling
information corresponding to the respective feelings from the rhythm
information processing portion 41, the concept information
processing portion 42, the image information processing portion
43, and the physiological information processing portion 44 to derive
final feeling information corresponding to the respective feelings.
Then, the update processing portion 45 updates the feeling
information held in the user feeling information recording portion
9 with the final feeling information.
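
The weighted addition and normalization of paragraph [0103] can be sketched as follows. The source names, the weights, and the choice of dividing by the total weight (which keeps every strength between 0 and 1) are assumptions; the specification does not state the weights or the normalization formula.

```python
FEELINGS = ("joy", "anger", "surprise", "sorrow")


def fuse(estimates, weights):
    """Weighted addition of per-source estimates, normalized by the total weight."""
    total_weight = sum(weights[src] for src in estimates)
    return {
        f: sum(weights[src] * est.get(f, 0.0) for src, est in estimates.items())
        / total_weight
        for f in FEELINGS
    }


# Hypothetical outputs of processing portions 41 to 44.
estimates = {
    "rhythm": {"joy": 0.2, "anger": 0.6},
    "concept": {"joy": 0.8},
    "image": {"joy": 0.6, "surprise": 0.4},
    "physiology": {"anger": 0.4},
}
weights = {"rhythm": 1.0, "concept": 2.0, "image": 1.0, "physiology": 1.0}

recorded = {f: 0.0 for f in FEELINGS}       # stands in for recording portion 9
recorded.update(fuse(estimates, weights))   # overwrite with the final update value
```

Giving the concept information a larger weight here reflects only one possible design; any weighting that sums the sources and rescales the result would fit the description.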

[0104] Here, Fig. 8 shows the feeling information held in the user
feeling information recording portion 9. For the feeling
information corresponding to the respective feelings, the strength
of the feeling is expressed by a real number from 0 to 1, for
example. Thus, the larger the number, the stronger the feeling (the
smaller the real number, the weaker the feeling). The update
processing portion 45 updates the real number serving as such feeling
information for each feeling.

[0105] Next, processings (feeling information updating processings)
in the user's feeling information updating portion 8 of Fig. 6 will
be described by referring to a flow chart shown in Fig. 9.
[0106] First of all, in Step S11, the rhythm information processing
portion 41, the concept information processing portion 42, the image
information processing portion 43, and the physiological information
processing portion 44 estimate a feeling of the user in the manner
as described above, and output the feeling information as the
estimation results to the update processing portion 45.
[0107] In Step S12, the update processing portion 45 comprehensively
uses the feeling information from the rhythm information
processing portion 41, the concept information processing portion
42, the image information processing portion 43, and the
physiological information processing portion 44 to obtain the
final update value with which the feeling information held in the
user feeling information recording portion 9 is to be updated. Then,
the process proceeds to Step S13. In Step S13, the update processing
portion 45 updates the feeling information held in the user feeling
information recording portion 9 with the update value. Thus, the
processings are completed.

[0108] Next, the above-mentioned series of processings can be 
executed by use of dedicated hardware or software. When the series 
of processings are executed by use of the software, a program 
constituting the software is installed in a general purpose computer 
or the like. 

[0109] Then, Fig. 10 shows an example of a configuration of an 
embodiment of the computer in which the program for executing the 
above-mentioned series of processings is installed. 
[0110] The program can be previously recorded on a hard disc 105 
or a ROM 103 as a recording medium incorporated in the computer. 
[0111] Alternatively, the program can be temporarily or permanently
stored (recorded) in a removable recording medium 111 such as a floppy
disc, a CD-ROM (Compact Disc Read Only Memory), an MO (Magneto Optical)
disc, a DVD (Digital Versatile Disc), a magnetic disc, or a
semiconductor memory. Such a removable recording medium 111 can be
provided as so-called package software.

[0112] Note that the program can be installed in the computer from
the removable recording medium 111 as described above, and in addition,
can be wirelessly transferred from a download site to the computer
through an artificial satellite for digital satellite broadcasting,
or can be transferred by wire to the computer through a network
such as a LAN (Local Area Network) or the Internet. In the computer,
the program transferred thereto in such a manner can be
received by a communication portion 108 and installed in
the hard disc 105 incorporated therein.

[0113] The computer incorporates a CPU (Central Processing Unit)
102. An I/O interface 110 is connected to the CPU 102 through a
bus 101. When a command is inputted to the CPU 102 through the
user's manipulation of an input portion 107 constituted by a keyboard,
a mouse, or the like, the CPU 102 executes the program
stored in a ROM (Read Only Memory) 103 in accordance with the command.
Alternatively, the CPU 102 loads, into a RAM (Random Access Memory)
104, a program stored in the hard disc 105, a program which is
transferred through a satellite or a network, received by the
communication portion 108, and installed in the hard disc 105, or a
program which is read out from the removable recording medium 111
inserted into a drive 109 and installed in the hard disc 105, and
executes the program. Thus, the CPU 102 executes processings
in accordance with the above-mentioned flow chart, or processings
carried out with the configuration of the above-mentioned block
diagram. Then, the CPU 102, if necessary, outputs the processing
results from an output portion 106 constituted by a liquid crystal
display (LCD), a speaker, or the like, transmits the processing
results from the communication portion 108, or records the processing
results on the hard disc 105, through the I/O interface 110.
[0114] Here, in this specification, the processing steps described 
in the program with which the computer is instructed to execute 
various kinds of processings are not necessarily executed in a time 
series manner in accordance with the flow of the flow chart. Thus, 
the processing steps also include processings which are executed 
in parallel or individually (e.g., parallel processings or
object-based processings).

[0115] In addition, the program may be executed by a single computer 
or may be executed in a distributed manner by a plurality of computers.
Moreover, the program may also be transferred to a remote computer 
to be executed by the remote computer. 

[0116] As described above, since a feeling of the user is estimated
based on at least the concept of words and phrases contained in
the voice recognition results for the voice of the user, the feeling
of the user can be estimated with relatively high precision. Moreover,
since a feeling of the user is estimated based on the rhythm
information, the face image information, and the physiological
information in addition thereto, the feeling of the user can be
estimated more precisely. Furthermore, since the output sentence
is generated based on such feeling estimation results, the output
sentence can be presented to the user in a wide variety of forms
depending on the feelings of the user.

[0117] In this embodiment, the voice recognition is carried out
for the sound (voice) inputted to the voice inputting portion 1.
However, the sound inputted to the voice inputting portion 1 need not
be subjected to the voice recognition. In this case, for example,
the sound may be detected as a sound of the user tapping on a desk
with his or her fingers, or a sound of the user breathing, and a
feeling of the user may be estimated based on the detection results.
That is, for example, when it is continuously detected that a desk is
tapped, it is possible to estimate that the user is angry. In
addition, for example, when it is detected that the user breathes
hard, it is possible to estimate that the user is excited. In this
case, it is possible to apply an ad hoc update rule that increases
the value of the feeling information expressing "anger" or
"excitement" based on such estimation results.

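The ad hoc update rule mentioned in paragraph [0117] might look like the following sketch. The event names `desk_tap` and `hard_breathing`, the increment of 0.1, and the clamping to the range 0 to 1 are assumptions added for illustration.

```python
def apply_sound_events(feelings, events, step=0.1):
    """Raise "anger" or "excitement" for detected non-speech sounds, capped at 1.0."""
    # Hypothetical mapping from detected sound events to feelings.
    rule = {"desk_tap": "anger", "hard_breathing": "excitement"}
    updated = dict(feelings)  # leave the caller's dictionary untouched
    for event in events:
        feeling = rule.get(event)
        if feeling is not None:
            updated[feeling] = min(1.0, updated.get(feeling, 0.0) + step)
    return updated


after_taps = apply_sound_events({"anger": 0.95}, ["desk_tap", "desk_tap"])
after_breath = apply_sound_events({}, ["hard_breathing", "door_slam"])
```

Repeated tapping pushes the "anger" value up to its maximum, while unrecognized events leave the feeling information unchanged.
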
[0118] Moreover, in the interaction controlling portion 3, the number
of utterances made to the user can be changed by controlling
the number of times the output sentence is generated in
accordance with the user's feeling. More specifically, for example,
when the user seems pleasant, the number of times of
the back-channel feedback is increased, and in addition thereto,
the number of utterances from the interactive system
is increased. Thus, it is possible to interact actively
with the user. In addition, for example, when the user seems
sorrowful, the number of utterances from the interactive
system is decreased. Thus, it is possible to prevent the user from
feeling annoyed.
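
The control of the number of utterances in paragraph [0118] can be sketched as a small policy function. The base counts, the threshold, and the feeling keys are hypothetical; the specification states only the direction of the change for a pleasant or a sorrowful user.

```python
def utterance_budget(feelings, base=2, threshold=0.5):
    """Return (system utterances, back-channel responses) per user turn."""
    if feelings.get("sorrow", 0.0) >= threshold:
        # Sorrowful user: speak less so as not to annoy the user.
        return max(1, base - 1), 1
    if feelings.get("joy", 0.0) >= threshold:
        # Pleasant user: speak more and give more back-channel feedback.
        return base + 1, 2
    return base, 1


pleasant = utterance_budget({"joy": 0.8})
sorrowful = utterance_budget({"sorrow": 0.7})
neutral = utterance_budget({})
```

The interaction controller would consult such a budget before deciding how many output sentences to generate in a turn.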

[0119] In addition, in this embodiment, the voice from the user 
is recognized, and the utterance is made in response to the voice 
recognition results. Besides, for example, a response may also be 
made to a sentence inputted through the user's manipulation of a 
keyboard. 

[0120] Moreover, in this embodiment, the response or the like to
the user is outputted in the form of the synthetic tone. In addition
thereto, for example, the response or the like to the user may also
be displayed in the form of a text or the like.
[0121] In addition, the present invention can be used as, for example,
a user interface between the user and a virtual character displayed
on a display device, a physical robot, or the like. In this case,
as the response or the like to the user, in addition to the output
of the synthetic tone as described above, a display state of the
virtual character is changed or the robot is made to carry out a
predetermined operation. Thus, it is possible to realize a
multi-modal interface.
[0122] 

[Effects of the Invention] According to the interactive processing
device and method, and the recording medium of the present invention,
the concept of the words and phrases outputted from the user is
extracted, and a feeling of the user is estimated based on the
extracted concept. Then, the output sentence to be outputted to
the user is generated based on the resultant feeling information.
Consequently, the interaction can be made in a wide variety of forms
depending on feelings of the user, for example.
[Brief Description of the Drawings] 

[Fig. 1] A block diagram showing an example of a configuration of
an embodiment of an interactive system to which the present invention 
is applied. 

[Fig. 2] A flow chart for explaining processings in the interactive 
system of Fig. 1. 

[Fig. 3] A block diagram showing an example of a configuration of 
a voice recognizing portion 2 of Fig. 1. 

[Fig. 4] A block diagram showing an example of a configuration of 
an interaction controlling portion 3 of Fig. 1. 
[Fig. 5] A block diagram showing an example of a configuration of 
a sentence generating portion 4 of Fig. 1. 

[Fig. 6] A block diagram showing an example of a configuration of 
a user's feeling information updating portion 8 of Fig. 1. 
[Fig. 7] A block diagram showing an example of a configuration of 
an image information processing portion 43 of Fig. 6. 
[Fig. 8] A diagram showing feeling information. 

[Fig. 9] A flow chart for explaining processings in a user's feeling
information updating portion 8 of Fig. 6.
[Fig. 10] A block diagram showing an example of a configuration
of an embodiment of a computer to which the present invention is
applied.

[Description of Reference Numerals] 

1, voice inputting portion; 2, voice recognizing portion; 3, 
interaction controlling portion; 4, sentence generating portion; 
5, voice outputting portion; 6, image inputting portion; 7, 
physiological information inputting portion; 8, user's feeling 
information updating portion; 9, user feeling information recording 
portion; 11, A/D converting portion; 12, feature extracting portion; 
13, matching portion; 14, acoustic model database; 15, dictionary 
database; 16, grammar database; 21, language processing portion; 
22, interaction processing portion; 23, thesaurus database; 24, 
language processing database; 25, history database; 26, scenario 
database; 31, text sentence generating portion; 32, rule 
synthesizing portion; 33, D/A converting portion; 34, dictionary 
database; 35, generating grammar database; 36, phoneme database; 
41, rhythm information processing portion; 42, concept information 
processing portion; 43, image information processing portion; 44, 
physiological information processing portion; 51, feature 
extracting portion; 52, vector quantization portion; 53, matching 
portion; 54, code book database; 55, HMM database; 101, bus; 102, 
CPU; 103, ROM; 104, RAM; 105, hard disc; 106, output portion; 107, 
input portion; 108, communication portion; 109, drive; 110, I/O
interface; and 111, removable recording medium.

FIG. 1

FIG. 2
START
S1 VOICE INPUT PROCESSING
S2 VOICE RECOGNITION PROCESSING
S7 VOICE OUTPUT PROCESSING
END

FIG. 3
VOICE INPUT
2 VOICE RECOGNIZING PORTION
11 A/D CONVERTING PORTION
12 FEATURE EXTRACTING PORTION
13 MATCHING PORTION
14 ACOUSTIC MODEL DATABASE
15 DICTIONARY DATABASE
16 GRAMMAR DATABASE
VOICE RECOGNITION RESULTS

FIG. 4
VOICE RECOGNITION RESULTS
CONTENT INFORMATION
23 THESAURUS DATABASE
24 LANGUAGE PROCESSING DATABASE
SYNTAX/SEMANTIC RULE WORD DICTIONARY
25 HISTORY DATABASE
INTERACTION HISTORY

FIG. 5
RESPONSE CONTENTS
4 SENTENCE GENERATING PORTION
31 TEXT SENTENCE GENERATING PORTION
32 RULE SYNTHESIZING PORTION
33 D/A CONVERTING PORTION
SOUND DATA
34 DICTIONARY DATABASE
35 GENERATING GRAMMAR DATABASE
36 PHONEME DATABASE

FIG. 6
RHYTHM INFORMATION
FACE IMAGE INFORMATION
PHYSIOLOGICAL INFORMATION
8 USER'S FEELING INFORMATION UPDATING PORTION
41 RHYTHM INFORMATION PROCESSING PORTION
42 CONCEPT INFORMATION PROCESSING PORTION
43 IMAGE INFORMATION PROCESSING PORTION
44 PHYSIOLOGICAL INFORMATION PROCESSING PORTION
45 UPDATE PROCESSING PORTION
UPDATE VALUE

FIG. 7
IMAGE INPUT
43 IMAGE INFORMATION PROCESSING PORTION
51 FEATURE EXTRACTING PORTION
52 VECTOR QUANTIZATION PORTION

FIG. 8

FIG. 9
FEELING INFORMATION UPDATING PROCESSING

FIG. 10
109 DRIVE

(57) Abstract:

PROBLEM TO BE SOLVED: To conduct interactive 
operations having rich variations depending on the 
feeling condition of a user. 

SOLUTION: In a voice recognition section 2, user's 
voice is recognized and phoneme information of the 
voice is extracted. In an interactive control section 3, 
conceptual information of the words and the phrases 
included in the voice recognition result obtained by the 
section 2 is extracted. An image inputting section 6 
photographs the face of the user and outputs face 
image information. In a physiological information 
inputting section 7, physiological information such as 
the pulse rate of the user is detected. Then, a user 
feeling information updating section 8 estimates the 
feeling of the user based on the phoneme, the 
conceptual, the face image and the physiological 
information. In the section 3 and a sentence generating 
section 4, an output sentence is generated and 
outputted to the user based on the estimated result of 
the feeling.